Abstract

Tsallis' 'statistical thermodynamic' formulation of the nonadditive entropy
of degree-$\alpha$ is neither correct nor self-consistent.

It is well known that the maximum entropy formalism [1], the minimum
discrimination information and Gauss' principle [2] all lead to the same results
when a certain condition on the prior probability distribution is imposed [?].
All these methods lead to the same form of the posterior probability
distribution; namely, the exponential family of distributions.

Tsallis and collaborators [?] have tried to adapt the maximum entropy formalism
that uses the Shannon entropy to one that uses a nonadditive entropy of
degree-$\alpha$. In order to come out with analytic expressions for the
probabilities that maximize the nonadditive entropy they found it necessary to
use 'escort probabilities' [7] of the same power as the nonadditive entropy.

If the procedure they use is correct then it follows that Gauss' principle
should give the same optimum probabilities. Yet, we will find that the Tsallis
result requires that the prior probability distribution be given by the same
unphysical condition as the maximum entropy formalism and, what is worse, the
potential of the error law be required to vanish. The potential of the error law
is what information theory refers to as the error [?]; that is, the difference
between the inaccuracy and the entropy. Unless the 'true' probability
distribution, $P = (p(x_1), p(x_2), \ldots, p(x_m))$, coincides with the
estimated probability distribution, $Q = (q(x_1), q(x_2), \ldots, q(x_m))$, the
error does not vanish. Moreover, we shall show that two procedures of averaging,
one using the escort probabilities explicitly, do not give the same result, and
the relation between the potential of the error law and the nonadditive entropy
requires the latter to vanish when the former vanishes.

Let $X$ be a random variable whose values $x_1, x_2, \ldots, x_m$ are obtained
at $m$ independent trials. Prior to the observations the distribution is $Q$,
and after the observations the unknown probability distribution is $P$. The
observer has at his disposal the statistic

$$\bar{a} = \frac{1}{m}\sum_{i=1}^{m} x_i$$

to help him formulate a guess as to the form of $Q$. Gauss' principle assumes
that the probability distribution $P$ depends on a parameter $a$,

$$a = E(X) = \sum_{i=1}^{m} x_i\, p(x_i), \qquad (1)$$

such that the arithmetic mean, $\bar{a}$, is the maximum likelihood estimate of
$a$. Furthermore, $P$ will depend upon the parameter $a$ in such a way that
there is a value $a^0$ for which $p(x_i; a^0) = q(x_i)$, the prior distribution.
The maximum likelihood estimate,

$$\frac{\partial}{\partial a}\log \mathcal{L}(a) = 0,$$

will lead to the exponential family of distributions when the log-likelihood
function is

$$\log \mathcal{L}(a) = \sum_{i=1}^{m} \log p(x_i; a).$$

The likelihood equation

$$\sum_{i=1}^{m} \frac{\partial}{\partial a}\psi(x_i; a) = 0,$$

where $\psi(x_i; a) = \log p(x_i; a)$, is the same as requiring

$$\sum_{i=1}^{m} (x_i - a) = 0,$$

and any deviations in one will immediately lead to deviations in the other.
Hence, they must be proportional to one another. Choosing the coefficient of
proportionality as the second derivative of some appropriate scalar function,
$V$, gives [5]

$$\frac{\partial}{\partial a}\psi(x_i; a) = V''(a)(x_i - a), \qquad (2)$$

where the prime stands for differentiation with respect to the argument. The
scalar potential, $V(a)$, must be independent of the $x_i$ because the left-hand
side is only a function of $x_i$ and a similar equation for $x_j$ would lead to
a contradiction. We assume that the potential is such that $V(a^0) = 0$.
Consequently, (2) can be rewritten as

$$\frac{\partial}{\partial a}\psi(x_i; a) = \frac{\partial}{\partial a}\left\{V'(a)(x_i - a) + V(a)\right\}.$$

Integrating from $a^0$ to $a$ gives

$$\psi(x_i; a) = \psi(x_i; a^0) + \lambda(a)(x_i - a) + V(a),$$

where $\lambda(a) = V'(a)$. In the usual case that the log-likelihood function
is logarithmic, we get an exponential family of distributions

$$\log p(x_i; a) = \log q(x_i) + \lambda(a)(x_i - a) + V(a). \qquad (3)$$

Averaging both sides with respect to the probability distribution $P$ gives

$$\sum_{i=1}^{m}\left\{p(x_i; a)\log p(x_i; a) - p(x_i; a)\log q(x_i)\right\} = V(a).$$

In information theory, the first term is the negative of the Shannon entropy,
the second term is the inaccuracy, and the right-hand side is the error [?]. On
the strength of Shannon's inequality,

$$\sum_{i=1}^{m} p(x_i; a)\log\left(\frac{p(x_i; a)}{q(x_i)}\right) = V(a) \ge 0, \qquad (4)$$

the inaccuracy cannot be smaller than the Shannon entropy. Shannon's inequality
follows very simply from the arithmetic-geometric mean inequality,

$$\prod_{i=1}^{m} X_i^{\,p(x_i; a)} \le \sum_{i=1}^{m} X_i\, p(x_i; a),$$

with $X_i = q(x_i)/p(x_i; a)$.

When $Q$ is the uniform distribution, i.e., $q(x_i) = 1/m$ $\forall i$,
Shannon's inequality becomes

$$S_0(1/m) - S_1(P) = V(a),$$

which we have referred to as the entropy reduction caused by the application
of a constraint that produces a finite value of $a$. $S_0(1/m) = \log m$ is the
maximum entropy, and it is known as the Hartley entropy in information theory.
Classically, the entropy is defined to within a constant; only entropy
differences are measurable.
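
As a numerical illustration of (3), (4) and the uniform-prior case, the
following minimal sketch builds a member of the exponential family from a
uniform prior and checks that the potential fixed by normalization coincides
with the error, and with $\log m$ minus the Shannon entropy. The outcomes `x`,
the value `lam` and all helper names are hypothetical choices made only for
this example.

```python
import math

def tilt(q, x, lam):
    """Exponential-family member p_i = q_i * exp(lam * x_i) / normaliser."""
    w = [qi * math.exp(lam * xi) for qi, xi in zip(q, x)]
    z = sum(w)
    return [wi / z for wi in w]

def shannon(p):
    """Shannon entropy S_1(P) = -sum_i p_i log p_i."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def error(p, q):
    """Error (inaccuracy minus entropy): sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

x = [1.0, 2.0, 3.0, 4.0]      # hypothetical outcomes
m = len(x)
q = [1.0 / m] * m             # uniform prior Q
lam = 0.7                     # hypothetical value of lambda(a)

p = tilt(q, x, lam)
a = sum(xi * pi for xi, pi in zip(x, p))   # a = E(X), as in (1)

# Potential from normalising (3): exp(-V) = sum_i q_i * exp(lam * (x_i - a))
V = -math.log(sum(qi * math.exp(lam * (xi - a)) for qi, xi in zip(q, x)))

print(V, error(p, q), math.log(m) - shannon(p))
```

The three printed numbers should agree to rounding error, and they are
non-negative, as (4) requires.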

However, all that we have said so far does not apply to equilibrium
thermodynamics [9]. If we average (3) with respect to the $Q$ distribution,
instead of the $P$ distribution, and use Shannon's inequality,
$\sum_{i=1}^{m} q(x_i)\log[q(x_i)/p(x_i; a)] \ge 0$, we immediately run into a
problem because $V(a)$ must now be necessarily negative. In statistical
mechanics, $q(x_i)$ represents the surface of constant energy of a hypersphere
of high dimensionality [10]. Because of its high dimensionality, the volume of
the hypersphere lies very close to its surface so that $q(x_i)$ can be thought
of as the volume of phase space occupied by the system. Averages are performed
with respect to this non-normalizable prior probability distribution [11]. In
order to keep the error $V(a)$, which will soon be identified as the
thermodynamic entropy, positive, it is necessary to introduce a sign change in
(3).


This sign change can be rationalized in the following way. The exponential
factor, $e^{\lambda(a)x_i}$, will not overpower the rapidly increasing factor of
the density of states, $q(x_i)$. However, the density of states cannot increase
faster than a certain power of the radius, $x_i$, of the phase space volume,
which is proportional to $x_i^{m-1}$ in a hypersphere of $m$ dimensions. What is
needed is an even more rapidly decreasing exponential factor
$\exp(-\lambda x_i)$.

According to the Boltzmann-Planck interpretation, $q(x_i)$ is not a normalized
probability, but, rather, a 'thermodynamic' probability, being proportional to
the volume of phase space occupied by the system. The (random) entropy $S(x_i)$
is defined as the logarithm of the thermodynamic probability

$$S(x_i) = \log q(x_i).$$

The phase average is given by

$$E[X] = \sum_{i=1}^{m} x_i\, q(x_i) \Big/ \sum_{i=1}^{m} q(x_i).$$

The thermodynamic entropy is the phase space average of
$\log[q(x_i)/p(x_i; a)]$, viz.,

$$S(a) = \frac{\sum_{i=1}^{m} q(x_i)\log[q(x_i)/p(x_i; a)]}{\sum_{i=1}^{m} q(x_i)},$$

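and its Legendre transform

$$S(a) - \lambda(a)\,a = \log Z(\lambda),$$

defines the logarithm of the generating function, $Z(\lambda)$. The inaccuracy
now appears as the difference between the thermodynamic entropy and the average
of the random entropies

$$-\frac{\sum_{i=1}^{m} q(x_i)\log p(x_i; a)}{\sum_{i=1}^{m} q(x_i)} = S(a) - \frac{\sum_{i=1}^{m} q(x_i)\,S(x_i)}{\sum_{i=1}^{m} q(x_i)} \ge 0. \qquad (5)$$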

The inequality follows from the facts that $S$ increases in the wide sense and
is concave. The expectation $a$ can be taken either with respect to $P$ or $Q$.
The two averages must necessarily coincide for otherwise there would not be a
single general thermodynamics, but rather a "microcanonical thermodynamics" and
a separate "canonical thermodynamics" [11]. Taken with respect to $Q$, (5) is
Jensen's inequality for a concave function, where the components of $Q$ are
positive but otherwise arbitrary. Taken with respect to the normalized $P$, (5)
is the Jensen-Petrović inequality [12], where
$\sum_{i=1}^{m} p(x_i)\,x_i \ge x_j$ for each $j = 1, \ldots, m$. The average of
$m$ variables is likely to be considerably greater than any of its components.
Then, if $S$ is increasing,

$$S\left(\sum_{i=1}^{m} x_i\, p(x_i)\right) \ge S(x_j)$$

for $j = 1, \ldots, m$. Multiplying by $q(x_j)$ and summing gives back (5). This
does not mean that $S(x_i)/x_i$ should not decrease: a sufficient condition for
$S\left(\sum_{i=1}^{m} x_i\right) < \sum_{i=1}^{m} S(x_i)$ is that $S(x_i)/x_i$
should decrease.
That $S(x_i)$ is an increasing function and $S(x_i)/x_i$ decreases, i.e., it is
anti-star shaped, are the criteria for inequality attenuation [15]. Fluctuations
give rise to inaccuracy (5), and, in their absence, a function of the average is
equal to an average of the function.
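
The two inequalities used in this paragraph can be checked numerically. The
sketch below assumes, purely for illustration, the entropy function
$S(x) = \log(1 + x)$, which is increasing and concave and has a decreasing ratio
$S(x)/x$; the values `x` and the positive, non-normalized weights `q` are
likewise hypothetical.

```python
import math

def S(x):
    """Hypothetical entropy function: increasing, concave, S(x)/x decreasing."""
    return math.log(1.0 + x)

x = [0.5, 1.0, 2.0, 4.0]      # hypothetical values of the random variable
q = [3.0, 5.0, 2.0, 1.0]      # positive, non-normalised weights

norm = sum(q)
mean = sum(xi * qi for xi, qi in zip(x, q)) / norm

# Jensen's inequality for a concave S: S(E[X]) - E[S(X)] >= 0
jensen_gap = S(mean) - sum(qi * S(xi) for xi, qi in zip(x, q)) / norm

# Subadditivity implied by a decreasing S(x)/x: sum_i S(x_i) - S(sum_i x_i) > 0
subadditive_gap = sum(S(xi) for xi in x) - S(sum(x))

ratios = [S(xi) / xi for xi in x]
print(jensen_gap, subadditive_gap, ratios)
```

Both gaps come out positive for this choice of $S$, and the printed ratios
decrease along the increasing sequence `x`.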

Therefore, if the exponential probability distribution, (3), is to coincide with
Gauss' error law, written in terms of the concavity of the entropy,

$$\log p(x_i; a) = S(x_i) - S'(a)(x_i - a) - S(a) = \tfrac{1}{2}S''(\tilde{a})(x_i - a)^2, \qquad (6)$$

where $\tilde{a}$ lies between $x_i$ and $a$, then sign changes are needed. When
this is done (3) becomes

$$\log p(x_i; a) = \log q(x_i) - \lambda(a)(x_i - a) - V(a). \qquad (7)$$

A comparison of (6) and (7) shows that the entropy, $S(a)$, is the potential,
$V(a)$, that determines the error law (6). The concavity of the entropy ensures
that the exponent will be negative and hence $p(x_i; a)$ will be less than
unity. The parameter, $\lambda(a)$, is still the derivative of the scalar
potential, $V(a)$, but since this potential now coincides with the thermodynamic
entropy, $S(a)$, the Lagrange multiplier $\lambda(a)$ is now identified as the
internal variable in the entropy representation.

Information theoretic entropies, and the entropy reduction of the
thermodynamics of extremes [?], are not amenable to the previous thermodynamic
interpretation, where the entropy is defined as the logarithm of the volume of
phase space occupied by the system. Since all the volume lies very near the
surface in a thermodynamic system of high dimensionality, the volume of phase
space will coincide with the surface area, which is referred to as the structure
function [11].¹ Rather, we consider $P$ and $Q$ as two complete probability
distributions. For a given probability distribution, $Q$, we seek the
probability distribution $P$ which most closely resembles $Q$. This is the
minimum discrimination statistic of Kullback [2].

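In order to derive the nonadditive entropies of degree-$\alpha$, the logarithm
is replaced by the power function of which it is the well-known elementary limit
as $\alpha \to 1$,

$$\log p(x_i; a) \to \frac{p^{\alpha-1}(x_i; a) - 1}{\alpha - 1} = \psi(x_i; a),$$

and, with a similar relation for $\log q(x_i)$, the exponential law (3) becomes

$$\frac{p^{\alpha-1}(x_i; a) - q^{\alpha-1}(x_i)}{\alpha - 1} = \lambda(a)(x_i - a) + V(a).$$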
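
The elementary limit behind this replacement is easy to check numerically. A
minimal sketch, in which the function name `deformed_log` and the sample value
`p` are hypothetical:

```python
import math

def deformed_log(p, alpha):
    """(p**(alpha - 1) - 1) / (alpha - 1), which tends to log(p) as alpha -> 1."""
    if alpha == 1.0:
        return math.log(p)
    return (p ** (alpha - 1.0) - 1.0) / (alpha - 1.0)

p = 0.3   # hypothetical probability value
for alpha in (2.0, 1.5, 1.1, 1.001):
    print(alpha, deformed_log(p, alpha), math.log(p))
```

The second column approaches the third as `alpha` approaches 1; at
`alpha = 2` the deformed logarithm reduces to the linear expression `p - 1`.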

¹ In what turned out to be a futile attempt to justify Tsallis' formalism,
Plastino and Plastino [14] considered a structure function for the energy of the
form $E^{m-1}$. Assuming a bounded phase space, for no given reason, whose total
energy is $E_0$, they identified the Tsallis exponent as
$\alpha = (m-2)/(m-1)$, and, at the same time, defined the inverse temperature
as $\beta = (m-1)/E_0$. What they failed to realize is that in order to define a
temperature $m$ must be much greater than 1, so that $\alpha \simeq 1$. More
precisely, $m$ must be large enough to validate the use of Stirling's formula
[4]. If these are the conditions under which they claim Tsallis' statistical
mechanics applies, then it cannot be applied to thermodynamic systems, for such
systems would be far too small to be capable of defining intensive quantities
like temperature and pressure.






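Multiplying the preceding relation by $p(x_i; a)$, and summing gives

$$I_\alpha(Q) - S_\alpha(P) = \frac{\sum_{i=1}^{m} p(x_i; a)\left(p^{\alpha-1}(x_i; a) - q^{\alpha-1}(x_i)\right)}{\alpha - 1} = V(a) \ge 0,$$

where

$$I_\alpha(Q) = \frac{1 - \sum_{i=1}^{m} p(x_i)\, q^{\alpha-1}(x_i)}{\alpha - 1}$$

has been referred to as the inaccuracy [22], and

$$S_\alpha(P) = \frac{1 - \sum_{i=1}^{m} p^{\alpha}(x_i)}{\alpha - 1} \qquad (9)$$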

has been referred to as the Tsallis entropy [16] in the physical literature, but
has been well known in information theory since the late 1960's
[17, 18, 19, 20, 21, 22]. We will henceforth suppress the dependence of the
probability distribution $P$ on the average $a$, because Tsallis' statistical
thermodynamics makes no pretense of statistical inference. The inaccuracy is a
convex function of $Q$, for a given $P$, provided $\alpha < 2$. The inaccuracy
is defined as the sum of the entropy of degree-$\alpha$, (9), and the error,
$V(a)$. The inaccuracy has the property that

$$\lim_{q \to 1/m} I_\alpha(Q) = S_\alpha(1/m).$$

The negative of the error is what we have called the entropy reduction,
$\Delta S$ [5].
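
The limiting property of the inaccuracy can be illustrated numerically. The
sketch below assumes the form
$I_\alpha(Q) = (1 - \sum_i p(x_i)\,q^{\alpha-1}(x_i))/(\alpha - 1)$ for the
inaccuracy, which is the form consistent with its being the sum of (9) and the
error; the distribution `p` and the function names are hypothetical. Only the
uniform prior is checked here.

```python
def entropy(p, alpha):
    """Entropy of degree alpha, as in (9): (1 - sum_i p_i**alpha) / (alpha - 1)."""
    return (1.0 - sum(pi ** alpha for pi in p)) / (alpha - 1.0)

def inaccuracy(p, q, alpha):
    """Assumed form: (1 - sum_i p_i * q_i**(alpha - 1)) / (alpha - 1)."""
    s = sum(pi * qi ** (alpha - 1.0) for pi, qi in zip(p, q))
    return (1.0 - s) / (alpha - 1.0)

p = [0.5, 0.3, 0.15, 0.05]    # hypothetical complete distribution P
m = len(p)
u = [1.0 / m] * m             # uniform prior Q

results = {}
for alpha in (0.5, 1.5):
    I = inaccuracy(p, u, alpha)
    V = I - entropy(p, alpha)          # error for the uniform prior
    results[alpha] = (I, entropy(u, alpha), V)
    print(alpha, I, entropy(u, alpha), V)
```

For each `alpha` the first two printed values coincide, which is the limit
stated above, and the error in the last column is positive.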

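Now the inequality $V(a) \ge 0$ follows from Hölder's inequality. Consider the
case when all the $p(x_i)$ are rational; then they can be expressed in the form
$p(x_i) = x_i/\sum_{i=1}^{m} x_i$, and $Q$ is the uniform distribution,
$q(x_i) = 1/m$ $\forall i$. The expression for $V(a)$ then becomes

$$V(a) = (\alpha - 1)^{-1}\left\{\frac{\sum_{i=1}^{m} x_i^{\alpha}}{\left(\sum_{i=1}^{m} x_i\right)^{\alpha}} - m^{1-\alpha}\right\} \ge 0$$

because of Hölder's inequalities [?]

$$\sum_{i=1}^{m} x_i^{\alpha} \gtrless m^{1-\alpha}\left(\sum_{i=1}^{m} x_i\right)^{\alpha} \quad \text{for } \alpha \gtrless 1.$$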
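
A quick numerical check of the two Hölder cases, and of the sign of the error
they imply for a uniform prior, is sketched below. The positive integers `x`
and the function names are hypothetical.

```python
def holder_gap(x, alpha):
    """sum_i x_i**alpha - m**(1 - alpha) * (sum_i x_i)**alpha."""
    m = len(x)
    return sum(xi ** alpha for xi in x) - m ** (1.0 - alpha) * sum(x) ** alpha

def error_uniform(x, alpha):
    """Error V for p_i = x_i / sum(x) and a uniform prior q_i = 1/m."""
    m = len(x)
    total = sum(x)
    s = sum((xi / total) ** alpha for xi in x)
    return (s - m ** (1.0 - alpha)) / (alpha - 1.0)

x = [1, 2, 3, 10]             # hypothetical positive integers
for alpha in (0.5, 2.0):
    print(alpha, holder_gap(x, alpha), error_uniform(x, alpha))
```

The gap is negative for `alpha = 0.5` and positive for `alpha = 2.0`, while the
error is positive in both cases; when all the `x` are equal the gap vanishes.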



Now, Tsallis and collaborators [?] find that the maximization procedure of the
nonadditive entropy (9) with respect to the constraint

$$a_\alpha = E_\alpha(X) = \sum_{i=1}^{m} x_i\, p^{\alpha}(x_i) \Big/ \sum_{i=1}^{m} p^{\alpha}(x_i), \qquad (10)$$

using the escort probabilities [7], yields the stationary condition

$$p(x_i) = \frac{\left[1 - (1-\alpha)\lambda_\alpha (x_i - a_\alpha)\right]^{1/(1-\alpha)}}{Z_\alpha(\lambda_\alpha)}, \qquad (11)$$






where

$$\lambda_\alpha = \lambda \Big/ \sum_{i=1}^{m} p^{\alpha}(x_i) \qquad (12)$$

and $\lambda$ is the Lagrange multiplier for the constraint (10). The
normalization condition of the $p(x_i)$ gives the partition function as

$$Z_\alpha(\lambda_\alpha) = \sum_{i=1}^{m} \left[1 - (1-\alpha)\lambda_\alpha (x_i - a_\alpha)\right]^{1/(1-\alpha)}. \qquad (13)$$

At best, (11) can be considered as an implicit relation for the probabilities
since it contains the probabilities explicitly through (12).
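
To make the form of (11) and (13) concrete, the sketch below evaluates them with
$\lambda_\alpha$ and $a_\alpha$ supplied as fixed inputs. This is an assumption
made only for illustration: no attempt is made to solve (10) and (12)
self-consistently, so the `a_alpha` passed in need not equal the escort average
of the resulting probabilities. All names and numerical values are hypothetical,
and `alpha` must differ from 1.

```python
import math

def tsallis_weights(x, lam_alpha, a_alpha, alpha):
    """Numerators of (11): [1 - (1-alpha)*lam_alpha*(x_i - a_alpha)]**(1/(1-alpha))."""
    weights = []
    for xi in x:
        base = 1.0 - (1.0 - alpha) * lam_alpha * (xi - a_alpha)
        if base <= 0.0:
            raise ValueError("the bracket in (11) must be positive")
        weights.append(base ** (1.0 / (1.0 - alpha)))
    return weights

def tsallis_probabilities(x, lam_alpha, a_alpha, alpha):
    """Return the probabilities (11) and the partition function (13)."""
    w = tsallis_weights(x, lam_alpha, a_alpha, alpha)
    z = sum(w)
    return [wi / z for wi in w], z

x = [1.0, 2.0, 3.0, 4.0]                   # hypothetical outcomes
lam_alpha, a_alpha, alpha = 0.3, 2.5, 0.5  # hypothetical fixed inputs

p, z = tsallis_probabilities(x, lam_alpha, a_alpha, alpha)
escort_mean = (sum(xi * pi ** alpha for xi, pi in zip(x, p))
               / sum(pi ** alpha for pi in p))
print(sum(p), z, escort_mean)
```

Comparing the printed escort average with the input `a_alpha` shows how far
the chosen inputs are from a self-consistent solution. As `alpha` approaches 1
the weights approach the exponential form `exp(-lam_alpha * (x_i - a_alpha))`.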

In order to reduce Gauss' principle, in the nonadditive form obtained above by
deforming the logarithms in (3), to something that even vaguely looks like the
'optimal' probabilities (11) that maximize the Tsallis entropy, (9), it is
necessary to:

1. assume that $P$ is an incomplete distribution,

2. set $q(x_i) = 1$ $\forall i$, and

3. set $V(a) = 0$.

We then obtain

$$\frac{p(x_i)}{\sum_{i=1}^{m} p(x_i)} = \frac{\left[1 + (\alpha-1)\lambda_\alpha (x_i - a_\alpha)\right]^{1/(\alpha-1)}}{Z_\alpha(\lambda_\alpha)},$$

where the partition function is given by (13), and we used the escort
probabilities (10) to define the parameter $a$, rather than the weighted average
(1).
Rather, if we take (7) and introduce the approximation

$$\frac{p^{1-\alpha}(x_i) - 1}{1 - \alpha} \longrightarrow \log p(x_i),$$

and a similar expression for $q(x_i)$ we get

$$\frac{p^{1-\alpha}(x_i) - q^{1-\alpha}(x_i)}{1 - \alpha} = -\lambda_\alpha (x_i - a_\alpha) - V(a_\alpha). \qquad (14)$$

Setting $q(x_i) = 1$ and $V(a_\alpha) = 0$, and requiring the probability
distribution $P$ to be normalized results in (11), or, equivalently,

$$p^{1-\alpha}(x_i) = \frac{1 - (1-\alpha)\lambda_\alpha (x_i - a_\alpha)}{Z_\alpha^{1-\alpha}(\lambda_\alpha)}.$$

Multiplying both sides by $p^{\alpha}(x_i)$ and summing gives [25]

$$Z_\alpha^{1-\alpha}(\lambda_\alpha) = \sum_{i=1}^{m} p^{\alpha}(x_i),$$

provided $a_\alpha$ is given by the escort average, (10). Rather, if we raise
both sides of (11) to the power $\alpha$, sum, and rearrange, we get

$$Z_\alpha^{\alpha}(\lambda_\alpha)\sum_{i=1}^{m} p^{\alpha}(x_i) = \sum_{i=1}^{m} \left[1 - (1-\alpha)\lambda_\alpha (x_i - a_\alpha)\right]^{\alpha/(1-\alpha)} \ne Z_\alpha(\lambda_\alpha).$$

The only difference between the two forms of averaging is that in the first case
use has been made of the escort probability average, (10). Since the two results
do not coincide, we conclude that there is something amiss with the escort
probability average, (10).

Moreover, if we take the $\alpha \to 1$ limit in (14) we obtain

m 

i=i 

which is not the correct expression for the partition function even in the
unphysical case of a density of states equal to unity.

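Finally, multiplying both sides of (14) by $p^{\alpha}(x_i)$, and summing,
results in

$$\frac{1 - \sum_{i=1}^{m} p(x_i)\left[q(x_i)/p(x_i)\right]^{1-\alpha}}{\alpha - 1} = V(a_\alpha)\sum_{i=1}^{m} p^{\alpha}(x_i).$$

If we now set $q(x_i) = 1$ $\forall i$, we come out with

$$S_\alpha(P) = V(a_\alpha)\sum_{i=1}^{m} p^{\alpha}(x_i). \qquad (15)$$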

This shows the correspondence between the Shannon entropy and the potential
$V(a)$ in the $\alpha \to 1$ limit, that was alluded to above in the
thermodynamic formulation which takes into account a non-normalized prior
probability distribution. However, we have set the prior probability
distribution equal to unity, as in the maximum entropy method, and, furthermore,
in order to derive the probability distribution from Gauss' law we had to assume
that $V$ is identically zero. Relation (15) would, consequently, require the
vanishing of the nonadditive entropy, (9).

Based on the foregoing results, we can only conclude that Tsallis' 'statistical
thermodynamic' formulation of the nonadditive entropy of degree-$\alpha$ is
neither correct nor self-consistent.